Identity Matching Based on Probabilistic Relational Models
Abstract
Identity management is critical to organizational practices ranging from citizen services to crime investigation. Searching for a specific identity is difficult because unintentional errors and intentional deception can produce multiple representations of the same identity. In this study we propose a probabilistic relational model (PRM) based approach to matching identities in databases. By exploring the relational structure of a database, we derive three categories of features: personal identity features, social activity features, and social relationship features. From these derived features, a probabilistic prediction model is constructed to make a matching decision on a pair of identities. An experimental study using a real criminal dataset demonstrates the effectiveness of the proposed PRM-based approach. Incorporating social activity features increased the average precision of identity matching from 53.73% to 54.64%; additionally incorporating social relationship features increased the average precision to 68.27%.
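The pairwise matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' PRM: a simple logistic scoring function stands in for the probabilistic model, and the record fields (`name`, `dob`, `incidents`, `associates`), the weights, and the bias are all hypothetical values chosen for illustration. It only shows how one score per feature category (personal identity, social activity, social relationship) might be computed for a pair of records and combined into a match probability.

```python
from difflib import SequenceMatcher
from math import exp


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(rec_a, rec_b):
    """Return one similarity score per feature category for a pair of records."""
    # Personal identity: fuzzy name similarity plus exact date-of-birth agreement.
    name_sim = SequenceMatcher(
        None, rec_a["name"].lower(), rec_b["name"].lower()
    ).ratio()
    dob_sim = 1.0 if rec_a["dob"] == rec_b["dob"] else 0.0
    return {
        "personal": (name_sim + dob_sim) / 2,
        # Social activity: overlap of incidents each identity appears in.
        "activity": jaccard(rec_a["incidents"], rec_b["incidents"]),
        # Social relationship: overlap of known associates.
        "relationship": jaccard(rec_a["associates"], rec_b["associates"]),
    }


# Hypothetical weights and bias; a real model would estimate its
# parameters from labeled matching and non-matching pairs.
WEIGHTS = {"personal": 4.0, "activity": 2.0, "relationship": 3.0}
BIAS = -4.0


def match_probability(rec_a, rec_b):
    """Combine the category scores with a logistic function (assumed stand-in)."""
    feats = pair_features(rec_a, rec_b)
    score = BIAS + sum(WEIGHTS[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + exp(-score))


# Made-up records for illustration only.
a = {"name": "John Smith", "dob": "1980-01-02",
     "incidents": {"i1", "i2"}, "associates": {"p7", "p9"}}
b = {"name": "Jon Smith", "dob": "1980-01-02",
     "incidents": {"i2", "i3"}, "associates": {"p7", "p9"}}
c = {"name": "Maria Lopez", "dob": "1975-06-30",
     "incidents": {"i8"}, "associates": {"p1"}}

print(match_probability(a, b) > 0.5)  # near-duplicate pair
print(match_probability(a, c) > 0.5)  # unrelated pair
```

In this sketch the relationship score can lift a pair whose personal features alone are ambiguous, which mirrors the role the abstract attributes to social relationship features; the size of that effect here depends entirely on the assumed weights.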
Similar resources
Identity Uncertainty and Citation Matching
Identity uncertainty is a pervasive problem in real-world data analysis. It arises whenever objects are not labeled with unique identifiers or when those identifiers may not be perceived perfectly. In such cases, two observations may or may not correspond to the same object. In this paper, we consider the problem in the context of citation matching—the problem of deciding which citations corres...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes that generate a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Structural Matching in Computer Vision Using Probabilistic Relaxation
In this paper, we develop the theory of probabilistic relaxation for matching features extracted from 2D images, derive as limiting cases the various heuristic formulae used by researchers in matching problems, and state the conditions under which they apply. We successfully apply our theory to the problem of matching and recognizing aerial road network images based on road network models and t...
On the Connections between Relational and XML Probabilistic Data Models
A number of uncertain data models have been proposed, based on the notion of compact representations of probability distributions over possible worlds. In probabilistic relational models, tuples are annotated with probabilities or formulae over Boolean random variables. In probabilistic XML models, XML trees are augmented with nodes that specify probability distributions over their children. Bo...